Final Project DSA 301: Buffalo Crime Incidents

Nguyet Que Tran

Abstract

Crime and neighborhood safety are important concerns for anyone living in a city. By analyzing and visualizing Buffalo's crime data, this project gives a clearer view of the current situation in the area: the most common crime types, the days and times when incidents occur most often, and the neighborhoods where criminal activity is concentrated. Beyond providing information, the project also aims to raise viewers' awareness and vigilance.

Questions

  1. Which types of crime happen most often?
  2. On which days does crime happen most often?
  3. At what time of day does crime happen most often?
  4. Does crime occur more at the weekend or at night? Which types of crime happen more at the weekend or at night?
  5. Which areas or neighborhoods have high or low crime counts?
  6. Are there any addresses where more than 2 crimes occurred?

By working with spatial attributes, this project focuses on building customized analytical modules for processing and analyzing geospatial data. The goal is to provide information about crime locations in Buffalo through geospatial mapping, such as 2D maps, interactive point frequency maps, and interactive point distribution maps.

  1. Map crime locations under different conditions, such as crime type and neighborhood.
  2. How do the locations and frequencies of theft cases differ from those of homicide cases in Buffalo?
  3. Map the number of confirmed crime cases by Buffalo Council District. Which Council District is the most dangerous?

About the Dataset

Source: Buffalo Open Data - Crime Incidents

This dataset contains information about crime incidents in Buffalo.

The dataset was created on September 6, 2017 and was updated on February 16, 2022.

There are 279330 records in total and 29 attribute fields.

Contents

I. Exploratory Data Analysis

II. Data Cleaning

III. Visualization

IV. Spatial Data

I. Exploratory Data Analysis

In [1]:
!pip install chart_studio
In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from chart_studio import plotly
from chart_studio.plotly import plot, iplot
import plotly.graph_objs as go
import math

As mentioned, the dataset has 29 columns. Only the 12 columns needed for this project are read.

In [3]:
# Read the dataset from url, add ?$limit=300000 to read all records
crime_url = 'https://data.buffalony.gov/resource/d6g9-xbgu.csv?$limit=300000'
crime = pd.read_csv(crime_url, usecols=['case_number','incident_datetime','parent_incident_type', 
                                                    'hour_of_day','day_of_week','address_1','city','state','location',
                                                    'latitude','longitude','neighborhood_1'])
crime.head()
Out[3]:
case_number incident_datetime parent_incident_type hour_of_day day_of_week address_1 city state location latitude longitude neighborhood_1
0 22-0920427 2022-04-02T12:21:49.000 Theft 12 SATURDAY 100 Block LEXINGTON AV Buffalo NY POINT (-78.877 42.913) 42.913 -78.877 Elmwood Bryant
1 22-0930266 2022-04-03T09:43:28.000 Assault 9 SUNDAY 100 Block CRESCENT AV Buffalo NY POINT (-78.849 42.933) 42.933 -78.849 Parkside
2 22-0930128 2022-04-03T02:26:42.000 Assault 3 SUNDAY 400 Block GRIDER ST Buffalo NY POINT (-78.829 42.918) 42.918 -78.829 Delavan Grider
3 22-0930670 2022-04-03T18:45:00.000 Theft 19 SUNDAY 800 Block ABBOTT RD Buffalo NY POINT (-78.809 42.842) 42.842 -78.809 South Park
4 22-0940667 2022-04-04T16:10:00.000 Theft 16 MONDAY 2200 Block DELAWARE AV Buffalo NY NaN NaN NaN NaN
In [4]:
crime.shape
Out[4]:
(280849, 12)
  • The data used for this project contains 280849 records and 12 attributes.
In [5]:
crime.dtypes
Out[5]:
case_number              object
incident_datetime        object
parent_incident_type     object
hour_of_day               int64
day_of_week              object
address_1                object
city                     object
state                    object
location                 object
latitude                float64
longitude               float64
neighborhood_1           object
dtype: object

Limitation of the dataset: it lacks numerical data.

The only numerical field that is useful and can be combined with other fields is Hour of Day.

Ideal crime data would also contain information such as the number of people injured or killed.

Check missing values

In [6]:
# number of missing values in each columns
crime.isnull().sum()
Out[6]:
case_number                0
incident_datetime          5
parent_incident_type       0
hour_of_day                0
day_of_week                5
address_1                 39
city                       0
state                      0
location                1395
latitude                1395
longitude               1395
neighborhood_1           967
dtype: int64

Of the 280,849 cases in total:

  • 5 cases are missing Incident Datetime (and therefore Day of Week).

  • 39 cases are missing address information.

  • 1,395 cases are missing location, latitude, and longitude.

  • 967 cases are missing neighborhood information.
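
The missing counts are easier to judge as a share of all rows. The following is a minimal sketch on a made-up four-row frame; the toy values and the names `toy` and `missing_summary` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `crime`; the values are invented for illustration.
toy = pd.DataFrame({
    "address_1": ["1 Block A ST", None, "2 Block B AV", "3 Block C RD"],
    "latitude": [42.9, np.nan, np.nan, 42.8],
    "neighborhood_1": ["Central", "Parkside", None, "Lovejoy"],
})

# Missing count and percentage per column, largest count first.
missing_summary = pd.DataFrame({
    "missing": toy.isnull().sum(),
    "percent": (toy.isnull().mean() * 100).round(2),
}).sort_values("missing", ascending=False)

print(missing_summary)
```

Applied to `crime`, the same two expressions would show that every column is missing well under 1% of its values.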

In [7]:
# Cases that do not have DateTime information
crime[crime['incident_datetime'].isnull()]
Out[7]:
case_number incident_datetime parent_incident_type hour_of_day day_of_week address_1 city state location latitude longitude neighborhood_1
57423 11-0400654 NaN Breaking & Entering 0 NaN BROADWAY & BAILEY AV BUFFALO NY POINT (0 0) 0.000 0.000 UNKNOWN
74735 10-3130893 NaN Assault 0 NaN 200 Block STEVENSON ST BUFFALO NY POINT (-78.815 42.858) 42.858 -78.815 Seneca-Cazenovia
148138 12-3390923 NaN Assault 0 NaN GRANT ST & AMHERST ST BUFFALO NY POINT (0 0) 0.000 0.000 UNKNOWN
159897 13-0720178 NaN Theft 0 NaN 1 Block PLYMOUTH AV BUFFALO NY POINT (0 0) 0.000 0.000 UNKNOWN
198522 14-0420506 NaN Theft 0 NaN NaN Buffalo NY POINT (0 0) 0.000 0.000 UNKNOWN
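
Several of these rows carry `POINT (0 0)` coordinates and an `UNKNOWN` neighborhood. Such placeholders are not `NaN`, so `isnull()` and `dropna()` do not see them. One possible way to mark them as missing is sketched below on invented rows; treating every (0, 0) pair and every `UNKNOWN` label as missing is an assumption made here, not a step the notebook performs.

```python
import numpy as np
import pandas as pd

# Toy rows imitating the pattern seen above; values are illustrative.
toy = pd.DataFrame({
    "case_number": ["a", "b", "c"],
    "latitude": [0.0, 42.858, 0.0],
    "longitude": [0.0, -78.815, 0.0],
    "neighborhood_1": ["UNKNOWN", "Seneca-Cazenovia", "UNKNOWN"],
})

# Rows whose coordinates are the (0, 0) placeholder.
placeholder = (toy["latitude"] == 0) & (toy["longitude"] == 0)

# Turn the placeholders into NaN so they are counted as missing.
toy.loc[placeholder, ["latitude", "longitude"]] = np.nan
toy["neighborhood_1"] = toy["neighborhood_1"].mask(toy["neighborhood_1"] == "UNKNOWN")

print(toy.isnull().sum())
```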
In [8]:
crime['parent_incident_type'].value_counts()
Out[8]:
Theft                   122634
Assault                  57703
Breaking & Entering      53211
Theft of Vehicle         22980
Robbery                  18193
Sexual Assault            2497
Other Sexual Offense      2241
Homicide                   957
Sexual Offense             433
Name: parent_incident_type, dtype: int64
In [9]:
crime['day_of_week'].value_counts()
Out[9]:
Friday       38388
Saturday     38088
Wednesday    35851
Monday       35841
Sunday       35644
Thursday     35488
Tuesday      35407
MONDAY        4002
TUESDAY       3885
THURSDAY      3754
FRIDAY        3730
WEDNESDAY     3723
SUNDAY        3532
SATURDAY      3511
Name: day_of_week, dtype: int64
In [10]:
crime['day_of_week'].unique()
Out[10]:
array(['SATURDAY', 'SUNDAY', 'MONDAY', 'WEDNESDAY', 'THURSDAY', 'Tuesday',
       'Friday', 'Sunday', 'Wednesday', 'Thursday', 'Saturday', 'Monday',
       'FRIDAY', 'TUESDAY', nan], dtype=object)
  • There is a naming error in the Day of Week column: day names appear in both title case and upper case, so each day is counted twice.
In [11]:
# Note: UNKNOWN 2887
crime['neighborhood_1'].value_counts()
Out[11]:
Broadway Fillmore     16102
Central               14969
Kensington-Bailey     14597
North Park            13593
Genesee-Moselle       12749
Schiller Park         11908
Elmwood Bidwell       11729
Elmwood Bryant        11281
Upper West Side       10686
University Heights    10579
West Side             10052
Kenfield              10007
Riverside              9236
Lovejoy                8790
Masten Park            8747
Lower West Side        7723
Hopkins-Tifft          7188
Delavan Grider         7163
Fillmore-Leroy         6652
Allentown              6522
Seneca-Cazenovia       6321
South Park             5924
MLK Park               5864
Parkside               5340
Fruit Belt             5260
West Hertel            5258
Black Rock             4563
Hamlin Park            4485
Pratt-Willert          4239
Grant-Amherst          4102
Ellicott               3720
Kaisertown             3476
Central Park           3314
Seneca Babcock         2991
UNKNOWN                2887
First Ward             1865
Name: neighborhood_1, dtype: int64
In [12]:
crime['hour_of_day'].describe()
Out[12]:
count    280849.000000
mean         11.801253
std           7.364646
min           0.000000
25%           6.000000
50%          12.000000
75%          18.000000
max          23.000000
Name: hour_of_day, dtype: float64
In [13]:
crime['hour_of_day'].unique()
Out[13]:
array([12,  9,  3, 19, 16, 18, 13,  8, 23,  0, 20, 21, 10, 15, 14, 11,  4,
        2,  7,  1,  5, 17, 22,  6])
  • The exact time in the Incident Datetime column is reduced to one of 24 integer hours (0 to 23) in the Hour of Day column.
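
For comparison, the hour and the day name can also be derived directly from the timestamp. This is a minimal sketch using two timestamps copied from the preview of the data plus one missing value; the variable names are illustrative. The published `hour_of_day` may follow a different rule (one previewed row has a 02:26 timestamp with hour 3), so the derived hour is not guaranteed to match it.

```python
import pandas as pd

# Two timestamps from the data preview, plus a missing value.
stamps = pd.Series(["2022-04-03T09:43:28.000", "2022-04-04T16:10:00.000", None])

# Unparseable or missing values become NaT instead of raising an error.
parsed = pd.to_datetime(stamps, errors="coerce")

# Truncated hour (0 to 23) and upper-case day name for each timestamp.
hours = parsed.dt.hour
day_names = parsed.dt.day_name().str.upper()

print(pd.DataFrame({"hour": hours, "day": day_names}))
```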

Group crime types by neighborhood

In [14]:
pd.set_option('display.max_rows',500)
crime.groupby(['neighborhood_1','parent_incident_type']).size()
Out[14]:
neighborhood_1      parent_incident_type
Allentown           Assault                  901
                    Breaking & Entering      793
                    Homicide                  10
                    Other Sexual Offense      29
                    Robbery                  397
                    Sexual Assault            50
                    Sexual Offense             3
                    Theft                   3874
                    Theft of Vehicle         465
Black Rock          Assault                  931
                    Breaking & Entering      962
                    Homicide                  11
                    Other Sexual Offense      34
                    Robbery                  274
                    Sexual Assault            36
                    Sexual Offense             7
                    Theft                   1902
                    Theft of Vehicle         406
Broadway Fillmore   Assault                 3946
                    Breaking & Entering     3502
                    Homicide                 109
                    Other Sexual Offense     112
                    Robbery                 1383
                    Sexual Assault           160
                    Sexual Offense            16
                    Theft                   5397
                    Theft of Vehicle        1477
Central             Assault                 3188
                    Breaking & Entering     1206
                    Homicide                  22
                    Other Sexual Offense     127
                    Robbery                  761
                    Sexual Assault           183
                    Sexual Offense            21
                    Theft                   8668
                    Theft of Vehicle         793
Central Park        Assault                  461
                    Breaking & Entering      700
                    Homicide                   3
                    Other Sexual Offense      28
                    Robbery                  224
                    Sexual Assault            15
                    Sexual Offense             8
                    Theft                   1618
                    Theft of Vehicle         257
Delavan Grider      Assault                 2129
                    Breaking & Entering     1449
                    Homicide                  52
                    Other Sexual Offense      90
                    Robbery                  531
                    Sexual Assault            94
                    Sexual Offense            27
                    Theft                   2118
                    Theft of Vehicle         673
Ellicott            Assault                  815
                    Breaking & Entering      827
                    Homicide                   5
                    Other Sexual Offense      25
                    Robbery                  230
                    Sexual Assault            36
                    Sexual Offense             8
                    Theft                   1457
                    Theft of Vehicle         317
Elmwood Bidwell     Assault                 1227
                    Breaking & Entering     2138
                    Homicide                  16
                    Other Sexual Offense      82
                    Robbery                  641
                    Sexual Assault            68
                    Sexual Offense            11
                    Theft                   6532
                    Theft of Vehicle        1014
Elmwood Bryant      Assault                 1494
                    Breaking & Entering     1734
                    Homicide                  11
                    Other Sexual Offense      41
                    Robbery                  670
                    Sexual Assault            85
                    Sexual Offense            13
                    Theft                   6317
                    Theft of Vehicle         916
Fillmore-Leroy      Assault                 1780
                    Breaking & Entering     1208
                    Homicide                  41
                    Other Sexual Offense      65
                    Robbery                  515
                    Sexual Assault            85
                    Sexual Offense             7
                    Theft                   2317
                    Theft of Vehicle         634
First Ward          Assault                  450
                    Breaking & Entering      451
                    Homicide                   5
                    Other Sexual Offense      22
                    Robbery                   89
                    Sexual Assault            13
                    Sexual Offense             1
                    Theft                    678
                    Theft of Vehicle         156
Fruit Belt          Assault                 1161
                    Breaking & Entering      738
                    Homicide                  22
                    Other Sexual Offense      46
                    Robbery                  358
                    Sexual Assault            65
                    Sexual Offense            16
                    Theft                   2437
                    Theft of Vehicle         417
Genesee-Moselle     Assault                 3426
                    Breaking & Entering     2966
                    Homicide                 100
                    Other Sexual Offense      99
                    Robbery                 1016
                    Sexual Assault           144
                    Sexual Offense            27
                    Theft                   3826
                    Theft of Vehicle        1145
Grant-Amherst       Assault                  772
                    Breaking & Entering      931
                    Homicide                   7
                    Other Sexual Offense      30
                    Robbery                  255
                    Sexual Assault            25
                    Sexual Offense             5
                    Theft                   1725
                    Theft of Vehicle         352
Hamlin Park         Assault                 1049
                    Breaking & Entering     1043
                    Homicide                  18
                    Other Sexual Offense      42
                    Robbery                  291
                    Sexual Assault            39
                    Sexual Offense             9
                    Theft                   1523
                    Theft of Vehicle         471
Hopkins-Tifft       Assault                 1594
                    Breaking & Entering     1065
                    Homicide                   9
                    Other Sexual Offense      71
                    Robbery                  303
                    Sexual Assault            80
                    Sexual Offense            16
                    Theft                   3516
                    Theft of Vehicle         534
Kaisertown          Assault                  848
                    Breaking & Entering      720
                    Homicide                   8
                    Other Sexual Offense      40
                    Robbery                  123
                    Sexual Assault            29
                    Sexual Offense            10
                    Theft                   1399
                    Theft of Vehicle         299
Kenfield            Assault                 2438
                    Breaking & Entering     2166
                    Homicide                  43
                    Other Sexual Offense     103
                    Robbery                  712
                    Sexual Assault            98
                    Sexual Offense            13
                    Theft                   3447
                    Theft of Vehicle         987
Kensington-Bailey   Assault                 2867
                    Breaking & Entering     3052
                    Homicide                  59
                    Other Sexual Offense     108
                    Robbery                 1060
                    Sexual Assault            85
                    Sexual Offense            15
                    Theft                   6084
                    Theft of Vehicle        1267
Lovejoy             Assault                 2108
                    Breaking & Entering     1833
                    Homicide                  24
                    Other Sexual Offense      63
                    Robbery                  519
                    Sexual Assault            92
                    Sexual Offense            16
                    Theft                   3419
                    Theft of Vehicle         716
Lower West Side     Assault                 1630
                    Breaking & Entering     1230
                    Homicide                  27
                    Other Sexual Offense      61
                    Robbery                  464
                    Sexual Assault            72
                    Sexual Offense             9
                    Theft                   3724
                    Theft of Vehicle         506
MLK Park            Assault                 1692
                    Breaking & Entering     1015
                    Homicide                  39
                    Other Sexual Offense      56
                    Robbery                  450
                    Sexual Assault            73
                    Sexual Offense            14
                    Theft                   1902
                    Theft of Vehicle         623
Masten Park         Assault                 2067
                    Breaking & Entering     1695
                    Homicide                  44
                    Other Sexual Offense      63
                    Robbery                  692
                    Sexual Assault            74
                    Sexual Offense             6
                    Theft                   3287
                    Theft of Vehicle         819
North Park          Assault                 1269
                    Breaking & Entering     1598
                    Homicide                  13
                    Other Sexual Offense      52
                    Robbery                  525
                    Sexual Assault            65
                    Sexual Offense            12
                    Theft                   9258
                    Theft of Vehicle         801
Parkside            Assault                  447
                    Breaking & Entering      966
                    Homicide                   3
                    Other Sexual Offense      21
                    Robbery                  257
                    Sexual Assault            33
                    Sexual Offense             5
                    Theft                   3265
                    Theft of Vehicle         343
Pratt-Willert       Assault                 1071
                    Breaking & Entering      819
                    Homicide                  21
                    Other Sexual Offense      39
                    Robbery                  310
                    Sexual Assault            52
                    Sexual Offense             4
                    Theft                   1551
                    Theft of Vehicle         372
Riverside           Assault                 1962
                    Breaking & Entering     1946
                    Homicide                  31
                    Other Sexual Offense      92
                    Robbery                  595
                    Sexual Assault            87
                    Sexual Offense            20
                    Theft                   3777
                    Theft of Vehicle         726
Schiller Park       Assault                 2970
                    Breaking & Entering     2991
                    Homicide                  61
                    Other Sexual Offense     103
                    Robbery                 1006
                    Sexual Assault            91
                    Sexual Offense            16
                    Theft                   3509
                    Theft of Vehicle        1161
Seneca Babcock      Assault                  698
                    Breaking & Entering      663
                    Homicide                   3
                    Other Sexual Offense      24
                    Robbery                  146
                    Sexual Assault            25
                    Sexual Offense             7
                    Theft                   1144
                    Theft of Vehicle         281
Seneca-Cazenovia    Assault                 1541
                    Breaking & Entering     1173
                    Homicide                  12
                    Other Sexual Offense      78
                    Robbery                  245
                    Sexual Assault            45
                    Sexual Offense            14
                    Theft                   2720
                    Theft of Vehicle         493
South Park          Assault                 1196
                    Breaking & Entering     1059
                    Homicide                   8
                    Other Sexual Offense      57
                    Robbery                  136
                    Sexual Assault            36
                    Sexual Offense             6
                    Theft                   2954
                    Theft of Vehicle         472
UNKNOWN             Assault                  577
                    Breaking & Entering      480
                    Homicide                   3
                    Other Sexual Offense      19
                    Robbery                  182
                    Sexual Assault            25
                    Sexual Offense             1
                    Theft                   1437
                    Theft of Vehicle         163
University Heights  Assault                 1805
                    Breaking & Entering     2547
                    Homicide                  32
                    Other Sexual Offense      78
                    Robbery                  916
                    Sexual Assault            78
                    Sexual Offense            10
                    Theft                   4262
                    Theft of Vehicle         851
Upper West Side     Assault                 1953
                    Breaking & Entering     2385
                    Homicide                  36
                    Other Sexual Offense      85
                    Robbery                  938
                    Sexual Assault           104
                    Sexual Offense            13
                    Theft                   4339
                    Theft of Vehicle         833
West Hertel         Assault                  930
                    Breaking & Entering      837
                    Homicide                  10
                    Other Sexual Offense      71
                    Robbery                  244
                    Sexual Assault            51
                    Sexual Offense            12
                    Theft                   2684
                    Theft of Vehicle         419
West Side           Assault                 2173
                    Breaking & Entering     2265
                    Homicide                  37
                    Other Sexual Offense      85
                    Robbery                  683
                    Sexual Assault           104
                    Sexual Offense            14
                    Theft                   3948
                    Theft of Vehicle         743
dtype: int64
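
The long grouped listing can be hard to scan. As an alternative view, `pd.crosstab` puts neighborhoods in rows and incident types in columns, and can also report each type as a share of its neighborhood. The sketch below uses a made-up five-row frame; only the column names come from the notebook.

```python
import pandas as pd

# Toy data; the rows are invented for illustration.
toy = pd.DataFrame({
    "neighborhood_1": ["Central", "Central", "Central", "Parkside", "Parkside"],
    "parent_incident_type": ["Theft", "Theft", "Assault", "Theft", "Robbery"],
})

# Raw counts: one row per neighborhood, one column per incident type.
counts = pd.crosstab(toy["neighborhood_1"], toy["parent_incident_type"])

# Row-wise shares: each row sums to 1 (before rounding).
shares = pd.crosstab(
    toy["neighborhood_1"], toy["parent_incident_type"], normalize="index"
).round(2)

print(counts)
print(shares)
```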

II. Data Cleaning

Drop missing values

In [15]:
crime.dropna(how='any',inplace=True)
crime.shape
Out[15]:
(279411, 12)

Fix naming error of column Day of Week

In [16]:
crime['day_of_week']=crime['day_of_week'].str.upper()
In [17]:
crime['day_of_week'].value_counts()
Out[17]:
FRIDAY       41931
SATURDAY     41434
MONDAY       39604
WEDNESDAY    39350
TUESDAY      39076
THURSDAY     39020
SUNDAY       38996
Name: day_of_week, dtype: int64

III. Visualization

Type of crime incidents

In [18]:
len(crime['parent_incident_type'])
Out[18]:
279411
In [19]:
plt.figure(figsize=(12,5))
chart = sns.countplot(y='parent_incident_type', data=crime,palette='Spectral')
# set name for the plot
chart.set_title(f'{chart.get_ylabel().capitalize()}',fontweight='bold')
# add percentages for each bar
for p in chart.patches:
    chart.text(p.get_width(),p.get_y()+0.5,'{:1.2f}%'.format(p.get_width()*100/ float(len(crime['parent_incident_type']))),ha='left')
plt.show()
  • 43.56% of the recorded crime incidents in Buffalo are theft cases, almost half of the total.

  • Top crime incidents are theft, assault, and breaking and entering.
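
The percentages drawn on the bars can also be computed without a chart. A minimal sketch, assuming a toy column of four incidents (the values are invented):

```python
import pandas as pd

# Toy incident types; invented for illustration.
types = pd.Series(["Theft", "Theft", "Assault", "Robbery"], name="parent_incident_type")

# Share of each type in percent, most frequent first.
share = types.value_counts(normalize=True).mul(100).round(2)

print(share)
```

On the real data, `crime['parent_incident_type'].value_counts(normalize=True)` should give the same proportions as the bar labels.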

Day of Week and Hour of Day

In [20]:
# create function to draw multiple countplots 
def plot_multiple_countplots(crime, cols,num_cols,num_rows, hue=None):
             
    fig, axs = plt.subplots(num_rows, num_cols,figsize=(20, 10))
  
    for index, col in enumerate(cols):
        i = math.floor(index/num_cols)
        j = index - i*num_cols      
        
        if num_rows == 1:
            if num_cols == 1:
                chart = sns.countplot(x=crime[col], ax=axs, hue = hue, palette='Spectral')              
            else:
                chart = sns.countplot(x=crime[col], ax=axs[j],hue = hue, palette='Spectral')                
        else:
            chart = sns.countplot(x=crime[col], ax=axs[i, j],hue = hue, palette='Spectral')         
        # rotate axis labels   
        chart.set_xticklabels(chart.get_xticklabels(), rotation=15, ha ='center')           
        # set names each countplot
        chart.set_title(f'{chart.get_xlabel().capitalize()}',fontweight='bold')              
        # add percentages on top of each bar
        for p in chart.patches:               
            chart.text(p.get_x(),p.get_height()+1,'{:1.2f}%'.format(p.get_height()*100/ float(len(crime[col]))),ha='left')
In [21]:
plot_multiple_countplots(crime, ['day_of_week','hour_of_day'],2,1)

1. Day of week:

  • Friday and Saturday have slightly more incidents than the other days; Sunday has the fewest.

2. Hour of Day:

  • 12 a.m. is the hour with the most recorded crime incidents.
In [22]:
plt.figure(figsize=(17,10))
chart = sns.boxplot(x='parent_incident_type', y='hour_of_day',data=crime, hue ='day_of_week' , palette='Spectral')
chart.set_title(f'Timeline of Incident Types',fontweight='bold')
plt.show()
  • Most property crimes, such as Theft, Theft of Vehicle, and Breaking & Entering, happen between about 8 a.m. and 4 p.m., the typical working hours.

  • Crimes involving personal conflict, such as Assault, Robbery, Sexual Assault, and Homicide, are spread over a wider and less regular range of hours.

Does most crime happen at weekend?

In [23]:
crime['Weekend'] = crime['day_of_week'].isin(['SATURDAY', 'SUNDAY'])
ax=sns.catplot(x='parent_incident_type', y='hour_of_day', hue='Weekend', kind='box', dodge=False, data=crime)
ax.fig.suptitle(f'Crime on Weekend',fontweight='bold')
ax.fig.set_size_inches(17,5)
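  • The answer is: only slightly. In the cleaned data, Saturday and Sunday together account for 80430 of 279411 incidents (28.8%), close to the 28.6% expected if incidents were spread evenly over seven days. The box plot above compares hour-of-day distributions, so it does not show frequency by itself.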
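
The weekend question can be checked numerically from the day-of-week counts printed after cleaning. The sketch below copies those counts and compares the weekend share with the 2/7 share expected if incidents were spread evenly; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the cleaned day_of_week value_counts() output above.
day_counts = pd.Series({
    "FRIDAY": 41931, "SATURDAY": 41434, "MONDAY": 39604, "WEDNESDAY": 39350,
    "TUESDAY": 39076, "THURSDAY": 39020, "SUNDAY": 38996,
})

is_weekend = day_counts.index.isin(["SATURDAY", "SUNDAY"])

# Share of all incidents that fall on Saturday or Sunday.
weekend_share = day_counts[is_weekend].sum() / day_counts.sum()

# Average incidents per weekend day and per weekday.
per_weekend_day = day_counts[is_weekend].mean()
per_weekday = day_counts[~is_weekend].mean()

print(f"weekend share: {weekend_share:.3f} (even spread would be {2 / 7:.3f})")
print(f"per weekend day: {per_weekend_day:.1f}, per weekday: {per_weekday:.1f}")
```

An average weekend day has about 1% more incidents than an average weekday, and Friday, a weekday, is the busiest single day.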

Does most crime happen at night time?

In [24]:
x = [0,1,2,3,4,5,6,20,21,22,23]
crime['Night Time'] = crime['hour_of_day'].isin(x)

plt.figure(figsize=(12,7))
chart = sns.countplot(y='parent_incident_type', data=crime,hue='Night Time')
# set name for the plot
chart.set_title(f'Crime at Night',fontweight='bold')
# add percentages for each bar
for p in chart.patches:
    chart.text(p.get_width(),p.get_y()+0.25,'{:1.2f}%'.format(p.get_width()*100/ float(len(crime['parent_incident_type']))),ha='left')
plt.show()
  • Night time in this project runs from 8 p.m. to 6 a.m. (hours 20 to 23 and 0 to 6).

  • Only Theft, Breaking & Entering, and Sexual Offense occur more during the day. Daytime, especially 8 a.m. to 4 p.m., covers office working hours: people leave for work and stay at the office, which creates good conditions for thieves and intruders.

  • All other types of crime incident occur more at night.
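
The night window covers 11 of the 24 hours, so a type that is spread evenly over the day would have about 46% of its incidents at night. A minimal sketch of that comparison follows, using the same hour list as the cell above on an invented six-row frame; `NIGHT_HOURS`, `night_share`, and `expected_share` are illustrative names.

```python
import pandas as pd

# Same hours as the list `x` in the cell above.
NIGHT_HOURS = [0, 1, 2, 3, 4, 5, 6, 20, 21, 22, 23]

# Toy incidents; invented for illustration.
toy = pd.DataFrame({
    "parent_incident_type": ["Theft", "Theft", "Theft", "Assault", "Assault", "Assault"],
    "hour_of_day": [9, 14, 22, 1, 23, 12],
})
toy["night"] = toy["hour_of_day"].isin(NIGHT_HOURS)

# Fraction of each type's incidents that fall in the night window.
night_share = toy.groupby("parent_incident_type")["night"].mean()

# Fraction expected if incidents were spread evenly over 24 hours.
expected_share = len(NIGHT_HOURS) / 24

print(night_share)
print(f"expected under an even spread: {expected_share:.3f}")
```

A type whose night share is above `expected_share` is over-represented at night, regardless of how many incidents it has in total.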

Neighborhood and Location

In [25]:
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', data=crime,palette='Spectral')
# set name for the plot
chart.set_title(f'Neighborhood and Crime Cases',fontweight='bold')
for p in chart.patches:
    chart.text(p.get_width(),p.get_y()+0.5,'{:1.2f}%'.format(p.get_width()*100/ float(len(crime['neighborhood_1']))),ha='left')
plt.show()

High frequency of crime - Dangerous Neighborhoods:

  1. Broadway Fillmore

  2. Central

  3. Kensington-Bailey

  4. North Park

  5. Genesee-Moselle

  5. Genesee-Moselle

Low frequency of crime - Safe Neighborhoods:

  1. First Ward

  2. Seneca Babcock

  3. Central Park

  4. Kaisertown

  5. Ellicott
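
The two rankings can be read off a count series directly. The sketch below uses five counts copied from the neighborhood `value_counts()` output shown before cleaning and drops the `UNKNOWN` label before ranking; dropping it is a choice made here for illustration. These are raw counts, not rates per resident, so they say nothing about neighborhood size.

```python
import pandas as pd

# A few counts copied from the neighborhood_1 value_counts() output above.
counts = pd.Series({
    "Broadway Fillmore": 16102,
    "Central": 14969,
    "Seneca Babcock": 2991,
    "UNKNOWN": 2887,
    "First Ward": 1865,
})

# UNKNOWN is not a real neighborhood, so leave it out of the ranking.
ranked = counts.drop("UNKNOWN")

top = ranked.nlargest(2)      # highest counts first
bottom = ranked.nsmallest(2)  # lowest counts first

print(top)
print(bottom)
```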

In [26]:
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', data=crime, hue= 'parent_incident_type',palette='Spectral')
# set name for the plot
chart.set_title(f'Neighborhood with Incident Type',fontweight='bold')
plt.show()
  • North Park has the highest theft count of any neighborhood, and most of its cases are thefts.

  • Neighborhoods with the most theft cases: North Park, Broadway Fillmore, Central, Kensington-Bailey, Elmwood Bidwell, and Elmwood Bryant.

WHAT-IF No-Theft?

Because roughly 44% of recorded cases are thefts, this step removes all theft cases to take a closer look at the other incident types in each neighborhood.

In [27]:
# Remove all Theft cases. Note: str.contains matches substrings,
# so 'Theft of Vehicle' rows are removed as well.
crime2 = crime[crime['parent_incident_type'].str.contains('Theft')==False]
# Draw chart
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', data=crime2, hue= 'parent_incident_type',palette='tab10')
# set name for the plot
chart.set_title(f'Neighborhood with Non-Theft Incident Type',fontweight='bold')
plt.show()
  • With theft cases removed, neighborhoods suffer mainly from Assault and Breaking & Entering.

  • High frequency of Assault: Broadway Fillmore, Genesee-Moselle, Schiller Park, Central, Kenfield and Delavan Grider.

  • High frequency of Breaking & Entering: Broadway Fillmore, Genesee-Moselle, Schiller Park, Kensington-Bailey, University Heights.

  • High frequency of Robbery: Broadway Fillmore, Genesee-Moselle, and University Heights.

  • With theft cases removed, North Park is no longer the most dangerous neighborhood.
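
The effect of the theft filter on the ranking can be illustrated with a toy example. This is a sketch on hypothetical data, not the notebook's result. It also shows the difference between a substring filter, which drops 'Theft of Vehicle' too, and an exact-match filter.

```python
from collections import Counter

# Hypothetical mini-sample of (neighborhood, parent_incident_type) pairs.
incidents = [
    ("North Park", "Theft"),
    ("North Park", "Theft"),
    ("North Park", "Theft"),
    ("North Park", "Theft of Vehicle"),
    ("Broadway Fillmore", "Theft"),
    ("Broadway Fillmore", "Assault"),
    ("Broadway Fillmore", "Robbery"),
]

# Substring match: drops 'Theft' and 'Theft of Vehicle' alike.
no_theft_substring = [(n, t) for n, t in incidents if "Theft" not in t]
# Exact match: drops only the 'Theft' category.
no_theft_exact = [(n, t) for n, t in incidents if t != "Theft"]

before = Counter(n for n, _ in incidents)
after = Counter(n for n, _ in no_theft_substring)
print("Top neighborhood with theft:   ", before.most_common(1))
print("Top neighborhood without theft:", after.most_common(1))
```

Which filter is appropriate depends on whether vehicle theft should count as theft for the question being asked.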

In [28]:
plt.figure(figsize=(17,10))
chart = sns.countplot(y='neighborhood_1', hue = 'day_of_week',data=crime,palette='Spectral')
# set name for the plot
chart.set_title(f'Day of Crime in Neighborhood',fontweight='bold')
plt.show()
  • The Central neighborhood is more dangerous on weekends.

  • High alert on all days of the week: Broadway Fillmore, North Park, Kensington-Bailey, Schiller Park, Genesee-Moselle and Elmwood Bidwell.
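
Day-of-week comparisons depend on consistent labels. The raw data mixes spellings such as SATURDAY and Saturday, so a case-insensitive weekend test is a safe choice. The helper below is a hypothetical sketch that assumes the weekend means Saturday and Sunday.

```python
WEEKEND_DAYS = {"SATURDAY", "SUNDAY"}


def is_weekend(day_of_week: str) -> bool:
    """Case-insensitive weekend test (assumes weekend = Saturday and Sunday)."""
    return day_of_week.strip().upper() in WEEKEND_DAYS


for day in ("SUNDAY", "Saturday", "FRIDAY"):
    print(day, is_weekend(day))
```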

Check duplicated addresses

In [29]:
crime.duplicated(subset=['address_1'],keep='first').sum()
Out[29]:
258726
  • Interestingly, out of 279330 total records, 258726 are flagged as duplicated addresses.

=> More than 92% of records occur at an address that already appears earlier in the data, so most incidents happen at addresses with at least 2 recorded cases.
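
Note that `duplicated(keep='first')` counts repeat records, not distinct addresses. The plain-Python sketch below, on a hypothetical address list, shows the difference between that count and the number of addresses with more than 2 incidents.

```python
from collections import Counter

# Hypothetical address column, one entry per incident.
addresses = ["A ST", "A ST", "A ST", "B AV", "B AV", "C RD"]

counts = Counter(addresses)

# Equivalent of duplicated(keep='first').sum(): every repeat after the
# first occurrence of an address.
repeat_records = sum(n - 1 for n in counts.values())

# Distinct addresses with more than 2 recorded incidents.
busy_addresses = [addr for addr, n in counts.items() if n > 2]

print(f"{repeat_records} of {len(addresses)} records repeat an earlier address")
print("Addresses with more than 2 incidents:", busy_addresses)
```

Here half of the records are repeats, yet only one of the three addresses has more than 2 incidents, so the two figures answer different questions.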

In [30]:
crime.loc[crime.duplicated(subset=['address_1'], keep='first'),:]
Out[30]:
case_number incident_datetime parent_incident_type hour_of_day day_of_week address_1 city state location latitude longitude neighborhood_1 Weekend Night Time
8 22-0940226 2022-04-02T23:00:12.000 Theft of Vehicle 8 MONDAY FILLMORE AV & BROADWAY Buffalo NY POINT (-78.839 42.893) 42.893 -78.839 Broadway Fillmore False False
33 10-0850391 2009-09-22T00:00:00.000 Theft 0 TUESDAY 300 Block DORRANCE AV BUFFALO NY POINT (-78.811 42.832) 42.832 -78.811 South Park False True
79 10-2270049 2010-08-15T00:35:00.000 Theft 0 SUNDAY OOJ BUFFALO NY POINT (-78.878 42.886) 42.886 -78.878 Central True True
82 10-1621022 2010-06-11T00:00:00.000 Theft 0 FRIDAY 800 Block ABBOTT RD BUFFALO NY POINT (-78.808 42.841) 42.841 -78.808 South Park False True
85 12-0280814 2012-01-28T00:00:00.000 Theft 0 SATURDAY 1 Block HUMASON AV BUFFALO NY POINT (-78.799 42.914) 42.914 -78.799 Schiller Park True True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
280843 20-0930450 2020-04-01T19:47:00.000 Assault 15 THURSDAY 1500 Block JEFFERSON AV Buffalo NY POINT (-78.855 42.916) 42.916 -78.855 Masten Park False False
280844 21-2411006 2021-08-29T19:30:00.000 Theft 21 SUNDAY 2400 Block DELAWARE AV Buffalo NY POINT (-78.869 42.951) 42.951 -78.869 North Park True True
280845 19-3200783 2019-11-16T19:40:00.000 Robbery 19 SATURDAY 200 Block FRANKLIN ST BUFFALO NY POINT (-78.874 42.891) 42.891 -78.874 Central True False
280846 21-1220796 2021-05-02T19:33:29.000 Assault 19 SUNDAY GOEMBLE AV & WALDEN AL Buffalo NY POINT (-78.816 42.905) 42.905 -78.816 Genesee-Moselle True False
280848 20-0430889 2020-02-12T22:30:00.000 Theft 22 WEDNESDAY 100 Block INDIAN CHURCH RD Buffalo NY POINT (-78.802 42.854) 42.854 -78.802 Seneca-Cazenovia False True

258726 rows × 14 columns

IV. Spatial Data

In [31]:
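%%time 

!apt install gdal-bin python-gdal python3-gdal 
# Install rtree - Geopandas requirement
!apt install python3-rtree 
# Install Geopandas from source. Note: GitHub no longer supports the
# unauthenticated git:// protocol, so this command fails (see the output);
# Geopandas is installed from PyPI in the next cell instead.
!pip install git+git://github.com/geopandas/geopandas.git
# Install descartes - Geopandas requirement
!pip install descartes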
Reading package lists... Done
Building dependency tree       
Reading state information... Done
gdal-bin is already the newest version (2.2.3+dfsg-2).
python-gdal is already the newest version (2.2.3+dfsg-2).
The following additional packages will be installed:
  python3-numpy
Suggested packages:
  python-numpy-doc python3-nose python3-numpy-dbg
The following NEW packages will be installed:
  python3-gdal python3-numpy
0 upgraded, 2 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,288 kB of archives.
After this operation, 13.2 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 python3-numpy amd64 1:1.13.3-2ubuntu1 [1,943 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-gdal amd64 2.2.3+dfsg-2 [346 kB]
Fetched 2,288 kB in 1s (2,576 kB/s)
Selecting previously unselected package python3-numpy.
(Reading database ... 155455 files and directories currently installed.)
Preparing to unpack .../python3-numpy_1%3a1.13.3-2ubuntu1_amd64.deb ...
Unpacking python3-numpy (1:1.13.3-2ubuntu1) ...
Selecting previously unselected package python3-gdal.
Preparing to unpack .../python3-gdal_2.2.3+dfsg-2_amd64.deb ...
Unpacking python3-gdal (2.2.3+dfsg-2) ...
Setting up python3-numpy (1:1.13.3-2ubuntu1) ...
Setting up python3-gdal (2.2.3+dfsg-2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libspatialindex-c4v5 libspatialindex-dev libspatialindex4v5
  python3-pkg-resources
Suggested packages:
  python3-setuptools
The following NEW packages will be installed:
  libspatialindex-c4v5 libspatialindex-dev libspatialindex4v5
  python3-pkg-resources python3-rtree
0 upgraded, 5 newly installed, 0 to remove and 39 not upgraded.
Need to get 671 kB of archives.
After this operation, 3,948 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex4v5 amd64 1.8.5-5 [219 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-c4v5 amd64 1.8.5-5 [51.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/main amd64 python3-pkg-resources all 39.0.1-2 [98.8 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-dev amd64 1.8.5-5 [285 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-rtree all 0.8.3+ds-1 [16.9 kB]
Fetched 671 kB in 1s (897 kB/s)
Selecting previously unselected package libspatialindex4v5:amd64.
(Reading database ... 155865 files and directories currently installed.)
Preparing to unpack .../libspatialindex4v5_1.8.5-5_amd64.deb ...
Unpacking libspatialindex4v5:amd64 (1.8.5-5) ...
Selecting previously unselected package libspatialindex-c4v5:amd64.
Preparing to unpack .../libspatialindex-c4v5_1.8.5-5_amd64.deb ...
Unpacking libspatialindex-c4v5:amd64 (1.8.5-5) ...
Selecting previously unselected package python3-pkg-resources.
Preparing to unpack .../python3-pkg-resources_39.0.1-2_all.deb ...
Unpacking python3-pkg-resources (39.0.1-2) ...
Selecting previously unselected package libspatialindex-dev:amd64.
Preparing to unpack .../libspatialindex-dev_1.8.5-5_amd64.deb ...
Unpacking libspatialindex-dev:amd64 (1.8.5-5) ...
Selecting previously unselected package python3-rtree.
Preparing to unpack .../python3-rtree_0.8.3+ds-1_all.deb ...
Unpacking python3-rtree (0.8.3+ds-1) ...
Setting up libspatialindex4v5:amd64 (1.8.5-5) ...
Setting up python3-pkg-resources (39.0.1-2) ...
Setting up libspatialindex-c4v5:amd64 (1.8.5-5) ...
Setting up libspatialindex-dev:amd64 (1.8.5-5) ...
Setting up python3-rtree (0.8.3+ds-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.3) ...
/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link

Collecting git+git://github.com/geopandas/geopandas.git
  Cloning git://github.com/geopandas/geopandas.git to /tmp/pip-req-build-nu2ggusq
  Running command git clone -q git://github.com/geopandas/geopandas.git /tmp/pip-req-build-nu2ggusq
  fatal: remote error:
    The unauthenticated git protocol on port 9418 is no longer supported.
  Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
WARNING: Discarding git+git://github.com/geopandas/geopandas.git. Command errored out with exit status 128: git clone -q git://github.com/geopandas/geopandas.git /tmp/pip-req-build-nu2ggusq Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q git://github.com/geopandas/geopandas.git /tmp/pip-req-build-nu2ggusq Check the logs for full command output.
Requirement already satisfied: descartes in /usr/local/lib/python3.7/dist-packages (1.1.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from descartes) (3.2.2)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (1.4.0)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (1.21.5)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (2.8.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (0.11.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->descartes) (3.0.7)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->descartes) (3.10.0.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->descartes) (1.15.0)
CPU times: user 248 ms, sys: 88.1 ms, total: 336 ms
Wall time: 21.3 s
In [32]:
!pip install geopandas
Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
     |████████████████████████████████| 1.0 MB 14.9 MB/s 
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
     |████████████████████████████████| 6.3 MB 45.5 MB/s 
Requirement already satisfied: pandas>=0.25.0 in /usr/local/lib/python3.7/dist-packages (from geopandas) (1.3.5)
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
     |████████████████████████████████| 16.7 MB 490 kB/s 
Requirement already satisfied: shapely>=1.6 in /usr/local/lib/python3.7/dist-packages (from geopandas) (1.8.1.post1)
Requirement already satisfied: click>=4.0 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (7.1.2)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (57.4.0)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (2021.10.8)
Requirement already satisfied: attrs>=17 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (21.4.0)
Requirement already satisfied: six>=1.7 in /usr/local/lib/python3.7/dist-packages (from fiona>=1.8->geopandas) (1.15.0)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (2.8.2)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (1.21.5)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.25.0->geopandas) (2018.9)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1
In [33]:
import geopandas as gpd
In [34]:
pd.set_option('display.max_columns',None) 
# Add $limit=300000 to read in all records; the default is 1000 records.
crime_url = "https://data.buffalony.gov/resource/d6g9-xbgu.geojson?$limit=300000"
crime_gdf = gpd.read_file(crime_url)
#crime_gdf = gpd.read_file(crime_url, ignore_fields=["iso_a3", "gdp_md_est"])
crime_gdf.tail()
Out[34]:
city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week incident_id tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary updated_at case_number census_tract_2010 incident_datetime council_district geometry
280844 Buffalo North Park District D 42.951 Theft NY 360290050001001 SUNDAY None 005000 Buffalo Police are investigating this report o... 36029005000 1001 -78.869 1001 1 1 50 21 2021-08-29T21:40:28 2400 Block DELAWARE AV 360290050001 LARCENY/THEFT None 21-2411006 50 2021-08-29T19:30:00 NORTH POINT (-78.86900 42.95100)
280845 BUFFALO Central District B 42.891 Robbery NY 360290035022019 Saturday None 016500 Buffalo Police are investigating this report o... 36029016500 2019 -78.874 1021 2 1 165 19 2019-11-17T04:03:00 200 Block FRANKLIN ST 360290001102 ROBBERY None 19-3200783 165 2019-11-16T19:40:00 ELLICOTT POINT (-78.87400 42.89100)
280846 Buffalo Genesee-Moselle District C 42.905 Assault NY 360290029002005 SUNDAY None 002900 Buffalo Police are investigating this report o... 36029002900 2005 -78.816 2005 2 2 29 19 2021-05-02T19:34:29 GOEMBLE AV & WALDEN AL 360290029002 ASSAULT None 21-1220796 29 2021-05-02T19:33:29 FILLMORE POINT (-78.81600 42.90500)
280847 Buffalo None None None Theft NY None TUESDAY None None Buffalo Police are investigating this report o... None None None None None None None 18 2021-09-21T18:44:55 0 Block HASTINGS AV None LARCENY/THEFT None 21-2640840 None 2021-09-21T18:30:55 None None
280848 Buffalo Seneca-Cazenovia District A 42.854 Theft NY 360290165002004 WEDNESDAY None 001000 Buffalo Police are investigating this report o... 36029001000 2004 -78.802 2001 2 2 10 22 2020-02-12T22:50:00 100 Block INDIAN CHURCH RD 360290001102 LARCENY/THEFT None 20-0430889 10 2020-02-12T22:30:00 SOUTH POINT (-78.80200 42.85400)
In [35]:
!pip install contextily
Collecting contextily
  Downloading contextily-1.2.0-py3-none-any.whl (16 kB)
Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from contextily) (7.1.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from contextily) (3.2.2)
Collecting xyzservices
  Downloading xyzservices-2022.3.0-py3-none-any.whl (36 kB)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from contextily) (2.23.0)
Collecting mercantile
  Downloading mercantile-1.2.1-py3-none-any.whl (14 kB)
Requirement already satisfied: geopy in /usr/local/lib/python3.7/dist-packages (from contextily) (1.17.0)
Collecting rasterio
  Downloading rasterio-1.2.10-cp37-cp37m-manylinux1_x86_64.whl (19.3 MB)
     |████████████████████████████████| 19.3 MB 1.2 MB/s 
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from contextily) (1.1.0)
Requirement already satisfied: geographiclib<2,>=1.49 in /usr/local/lib/python3.7/dist-packages (from geopy->contextily) (1.52)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (1.4.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (2.8.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (3.0.7)
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (1.21.5)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->contextily) (0.11.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->contextily) (3.10.0.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->contextily) (1.15.0)
Requirement already satisfied: click>=3.0 in /usr/local/lib/python3.7/dist-packages (from mercantile->contextily) (7.1.2)
Requirement already satisfied: cligj>=0.5 in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (0.7.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (57.4.0)
Collecting snuggs>=1.4.1
  Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Collecting affine
  Downloading affine-2.3.1-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (2021.10.8)
Requirement already satisfied: click-plugins in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (1.1.1)
Requirement already satisfied: attrs in /usr/local/lib/python3.7/dist-packages (from rasterio->contextily) (21.4.0)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->contextily) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->contextily) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->contextily) (1.24.3)
Installing collected packages: snuggs, affine, xyzservices, rasterio, mercantile, contextily
Successfully installed affine-2.3.1 contextily-1.2.0 mercantile-1.2.1 rasterio-1.2.10 snuggs-1.4.7 xyzservices-2022.3.0
In [36]:
import contextily as ctx
%matplotlib inline
In [37]:
crime_gdf.drop(['incident_id','updated_at'], axis=1,inplace=True)
crime_gdf.head()
Out[37]:
city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime council_district geometry
0 Buffalo Elmwood Bryant District B 42.913 Theft NY 360290066021002 SATURDAY 006602 Buffalo Police are investigating this report o... 36029006602 1002 -78.877 1001 1 1 66.02 12 2022-04-02T12:22:49 100 Block LEXINGTON AV 360290066021 LARCENY/THEFT 22-0920427 66.02 2022-04-02T12:21:49 NIAGARA POINT (-78.87700 42.91300)
1 Buffalo Parkside District D 42.933 Assault NY 360290052014005 SUNDAY 005201 Buffalo Police are investigating this report o... 36029005201 4005 -78.849 4005 4 4 52.01 9 2022-04-03T09:43:28 100 Block CRESCENT AV 360290052014 ASSAULT 22-0930266 52.01 2022-04-03T09:43:28 DELAWARE POINT (-78.84900 42.93300)
2 Buffalo Delavan Grider District E 42.918 Assault NY 360290034005017 SUNDAY 003400 Buffalo Police are investigating this report o... 36029003400 5017 -78.829 5019 5 5 34 3 2022-04-03T03:26:42 400 Block GRIDER ST 360290034005 ASSAULT 22-0930128 34 2022-04-03T02:26:42 MASTEN POINT (-78.82900 42.91800)
3 Buffalo South Park District A 42.842 Theft NY 360290007005003 SUNDAY 000700 Buffalo Police are investigating this report o... 36029000700 5003 -78.809 5003 5 5 7 19 2022-04-03T19:00:00 800 Block ABBOTT RD 360290007005 LARCENY/THEFT 22-0930670 7 2022-04-03T18:45:00 SOUTH POINT (-78.80900 42.84200)
4 Buffalo None None None Theft NY None MONDAY None Buffalo Police are investigating this report o... None None None None None None None 16 2022-04-04T16:18:59 2200 Block DELAWARE AV None LARCENY/THEFT 22-0940667 None 2022-04-04T16:10:00 None None

Check the Coordinate Reference System (CRS)

Check the CRS and reproject the data to EPSG:3857 (Web Mercator), the projection used by the contextily basemap tiles, so that the points can be plotted on the map.

In [38]:
# Check crs
crime_gdf.crs
Out[38]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [39]:
# Change crs
crime_gdf.to_crs('epsg:3857',inplace=True)
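
To see what the reprojection does to a single point, the spherical Web Mercator formula can be applied by hand. This is a plain-Python sketch for illustration only; `to_web_mercator` is a hypothetical helper, and in the notebook the conversion is done by `to_crs`.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Project WGS 84 degrees to Web Mercator metres (spherical formula)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# First record of the dataset: POINT (-78.877 42.913) in EPSG:4326.
# After to_crs the notebook shows POINT (-8780547.475 5298738.921) for it.
x, y = to_web_mercator(-78.877, 42.913)
print(f"POINT ({x:.3f} {y:.3f})")
```

The projected values are in metres rather than degrees, which is why the geometry column changes from small decimals to seven-digit numbers.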

Check and drop missing geometry rows

In [40]:
crime_gdf.shape
Out[40]:
(280849, 27)
In [41]:
orig_rows = crime_gdf.shape[0]
crime_gdf = crime_gdf.loc[crime_gdf.geometry.notnull()]
print(f'Records with missing location information = {orig_rows-crime_gdf.shape[0]}')
Records with missing location information = 1395
In [42]:
#crime_gdf.geometry=crime_gdf.geometry.astype(float)
crime_gdf.dropna(subset =['geometry'], how='any',inplace=True)
#crime_gdf.dropna( how='any',inplace=True)
crime_gdf.shape
Out[42]:
(279454, 27)

Mapping crimes by neighborhoods

In [43]:
fig, ax = plt.subplots(figsize=(15,15), subplot_kw=dict(aspect='equal'))
crime_gdf.plot(column=crime_gdf['neighborhood_1'], ax=ax);
ax.set_title('Crime Incident Locations of Buffalo by Neighborhood',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

Some records have bad geometry: their locations fall outside New York State.

As a result, the map extends far beyond the Buffalo area.

To fix this, remove the records whose council_district is UNKNOWN.

In [44]:
#crime_gdf = crime_gdf.GeoDataFrame.drop(columns=['incident_id'],  axis=1, inplace=True)
crime_gdf.council_district.unique()
Out[44]:
array(['NIAGARA', 'DELAWARE', 'MASTEN', 'SOUTH', 'FILLMORE', 'ELLICOTT',
       'UNIVERSITY', 'UNKNOWN', 'LOVEJOY', 'NORTH'], dtype=object)
In [45]:
# set council_district as index of the dataframe
crime_gdf.set_index('council_district',inplace=True)
crime_gdf.head()
Out[45]:
city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry
council_district
NIAGARA Buffalo Elmwood Bryant District B 42.913 Theft NY 360290066021002 SATURDAY 006602 Buffalo Police are investigating this report o... 36029006602 1002 -78.877 1001 1 1 66.02 12 2022-04-02T12:22:49 100 Block LEXINGTON AV 360290066021 LARCENY/THEFT 22-0920427 66.02 2022-04-02T12:21:49 POINT (-8780547.475 5298738.921)
DELAWARE Buffalo Parkside District D 42.933 Assault NY 360290052014005 SUNDAY 005201 Buffalo Police are investigating this report o... 36029005201 4005 -78.849 4005 4 4 52.01 9 2022-04-03T09:43:28 100 Block CRESCENT AV 360290052014 ASSAULT 22-0930266 52.01 2022-04-03T09:43:28 POINT (-8777430.530 5301779.318)
MASTEN Buffalo Delavan Grider District E 42.918 Assault NY 360290034005017 SUNDAY 003400 Buffalo Police are investigating this report o... 36029003400 5017 -78.829 5019 5 5 34 3 2022-04-03T03:26:42 400 Block GRIDER ST 360290034005 ASSAULT 22-0930128 34 2022-04-03T02:26:42 POINT (-8775204.140 5299498.928)
SOUTH Buffalo South Park District A 42.842 Theft NY 360290007005003 SUNDAY 000700 Buffalo Police are investigating this report o... 36029000700 5003 -78.809 5003 5 5 7 19 2022-04-03T19:00:00 800 Block ABBOTT RD 360290007005 LARCENY/THEFT 22-0930670 7 2022-04-03T18:45:00 POINT (-8772977.750 5287953.474)
FILLMORE Buffalo Broadway Fillmore District C 42.893 Sexual Offense NY 360290016021004 WEDNESDAY 001602 Buffalo Police are investigating this report o... 36029001602 1004 -78.839 1000 1 1 16.02 18 2022-04-06T18:43:20 FILLMORE AV & BROADWAY 360290016021 RAPE 22-0960847 16 2022-04-06T18:42:20 POINT (-8776317.335 5295699.511)
In [46]:
crime_gdf.drop(['UNKNOWN'] , axis=0,inplace=True)
In [47]:
fig, ax = plt.subplots(figsize=(15,15), subplot_kw=dict(aspect='equal'))
crime_gdf.plot(column=crime_gdf['neighborhood_1'], ax=ax);
ax.set_title('Crime Incident Locations of Buffalo by Neighborhood',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)
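
An alternative to dropping the UNKNOWN district is to filter on the coordinates themselves. The sketch below assumes a rough bounding box around Buffalo. The box limits are illustrative values, not taken from the dataset, and the check works on latitude and longitude in degrees, so it would use the `latitude` and `longitude` columns rather than the projected geometry.

```python
# Illustrative limits only (degrees, EPSG:4326); adjust to the real city extent.
LON_MIN, LON_MAX = -79.0, -78.7
LAT_MIN, LAT_MAX = 42.8, 43.0


def in_bbox(lon: float, lat: float) -> bool:
    """Return True when the point lies inside the assumed bounding box."""
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX


# Two points from the dataset plus one hypothetical bad coordinate.
points = [(-78.877, 42.913), (-78.809, 42.842), (0.0, 0.0)]
kept = [p for p in points if in_bbox(*p)]
print(f"Kept {len(kept)} of {len(points)} points")
```

A coordinate filter also catches misplaced points that happen to carry a valid council district label.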

Mapping by crime types

In [48]:
crime_gdf.parent_incident_type.unique()
Out[48]:
array(['Theft', 'Assault', 'Sexual Offense', 'Theft of Vehicle',
       'Breaking & Entering', 'Robbery', 'Homicide',
       'Other Sexual Offense', 'Sexual Assault'], dtype=object)
In [49]:
crime_gdf.reset_index(inplace=True)
In [50]:
crime_gdf['conrank'] = 'lightgray'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Theft','conrank']='red'
crime_gdf.loc[crime_gdf.parent_incident_type == 'Assault','conrank']='blue'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Robbery','conrank']='purple'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Theft of Vehicle','conrank']='orange'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Breaking & Entering','conrank']='yellow'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Sexual Offense','conrank']='violet'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Other Sexual Offense','conrank']='brown'
#crime_gdf.loc[crime_gdf.parent_incident_type == 'Sexual Assault','conrank']='lime'
crime_gdf.loc[crime_gdf.parent_incident_type == 'Homicide','conrank']='deepPink'
# Note: the next line compares (==) instead of assigning (=), so it only
# returns a boolean Series; all other types keep the default 'lightgray'.
crime_gdf.loc[~crime_gdf.parent_incident_type.isin(['Homicide','Assault']),'conrank']=='gray'
Out[50]:
0         False
3         False
4         False
5         False
6         False
          ...  
276468    False
276469    False
276471    False
276472    False
276474    False
Name: conrank, Length: 218636, dtype: bool
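
The colour assignment can also be written as a lookup with a default, which avoids one `.loc` statement per crime type. This is a plain-Python sketch; `TYPE_COLORS` and `color_for` are hypothetical names, and the default mirrors the 'lightgray' value set at the top of the cell. With pandas the same idea would typically use `Series.map` followed by `fillna`.

```python
# Highlighted crime types and their marker colours.
TYPE_COLORS = {"Assault": "blue", "Homicide": "deepPink"}
DEFAULT_COLOR = "lightgray"  # every type that is not highlighted


def color_for(incident_type: str) -> str:
    """Return the marker colour for a crime type, falling back to the default."""
    return TYPE_COLORS.get(incident_type, DEFAULT_COLOR)


for incident_type in ("Theft", "Assault", "Homicide", "Robbery"):
    print(incident_type, "->", color_for(incident_type))
```

Adding another highlighted type then only requires one new dictionary entry.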
In [51]:
import matplotlib.lines as mlines

fig, ax = plt.subplots(figsize=(12,12), subplot_kw=dict(aspect='equal'))

deepPink_marker = mlines.Line2D([], [], color='deepPink', marker='.', linestyle='None',
                          markersize=10,label='Homicide')
blue_marker = mlines.Line2D([], [], color='blue', marker='.', linestyle='None',
                          markersize=10,label='Assault')
gray_marker=mlines.Line2D([], [], color='gray', marker='.', linestyle='None',
                          markersize=10, label='Other types')
ax.legend(handles=[deepPink_marker,blue_marker,gray_marker])

crime_gdf.plot(color=crime_gdf['conrank'], ax=ax)
ax.set_title('Buffalo Assault and Homicide Crime Cases',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

Mapping Duplicated locations

In [52]:
# The number of duplicated addresses here is smaller than above because rows with missing geometry or an UNKNOWN council district were removed
crime_gdf.duplicated(subset=['address_1'],keep='first').sum()
Out[52]:
256073
In [53]:
fig, ax = plt.subplots(figsize=(15,15), subplot_kw=dict(aspect='equal'))
crime_gdf.plot(column=crime_gdf.duplicated(subset=['address_1'],keep='first'), ax=ax);
ax.set_title('>= 2 Crime Incidents Cases Locations of Buffalo',fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.set_axis_off()
ctx.add_basemap(ax)

Point Frequency Maps

Bokeh

In [54]:
from bokeh.tile_providers import CARTODBPOSITRON, get_provider
tileProvider = get_provider('CARTODBPOSITRON_RETINA')

from bokeh.io import output_notebook, show, output_file, save
from bokeh.plotting import figure
from bokeh.models import HoverTool, GeoJSONDataSource
from bokeh.layouts import row,column
from bokeh.models.widgets import Div

output_notebook()

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
In [55]:
kwargs = {"plot_width":800,
          "plot_height":700,
          "sizing_mode":'scale_both',
          "outline_line_color":'#046626',
          "outline_line_width":3,
          "outline_line_alpha":.3,
          'toolbar_location':'above',
          'border_fill_color':'#4287f5',
          'border_fill_alpha':.3,
          'min_border_left': 20,
          'min_border_right':20,
          'min_border_top': 10,
          'min_border_bottom':20}
In [56]:
# Check null geometry 
orig_rows = crime_gdf.shape[0] 
crime_gdf = crime_gdf.loc[crime_gdf.geometry.notnull()]
print(f'Records with missing location information = {orig_rows-crime_gdf.shape[0]:,.0f}\n\
Percent missing = {((orig_rows-crime_gdf.shape[0])/orig_rows)*100:,.0f}%')
Records with missing location information = 0
Percent missing = 0%

Create a unique key

This key combines the x and y coordinates of each incident's location, so that locations where crimes happened more than once can be counted.

In [57]:
crime_gdf['newLoc'] = crime_gdf.geometry.x.astype(str)+ crime_gdf.geometry.y.astype(str)
In [58]:
numlocs = crime_gdf.newLoc.value_counts().rename_axis('uniquepts').to_frame('counts')
numlocs.head()
Out[58]:
counts
uniquepts
-8773756.9863626515302843.689697811 1026
-8775204.1397429615299498.92780969 1016
-8780658.7947918335304972.796811193 921
-8780213.5168286585305124.894422529 792
-8781104.0727550075295243.684788868 783

At some locations, crime incidents occurred at a very high rate.
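
One caveat about string keys: joining two numbers without a separator can, in principle, map two different points to the same string. The toy example below uses hypothetical coordinates to show the effect, and a tuple key that avoids it. With long projected coordinates such collisions are unlikely, so this is a precaution rather than a known problem in the data.

```python
from collections import Counter

# Hypothetical points chosen to collide when joined without a separator.
points = [(1.2, 34.5), (1.23, 4.5), (1.2, 34.5)]

# Key built as in the notebook: str(x) + str(y).
concat_keys = [str(x) + str(y) for x, y in points]
# Tuple key: x and y stay separate, so distinct points never merge.
tuple_keys = [(x, y) for x, y in points]

print(Counter(concat_keys))
print(Counter(tuple_keys))
```

A separator such as a comma between the two strings would fix the string key in the same way.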

In [59]:
crime_gdf.geometry.value_counts().sum()
Out[59]:
276475
In [60]:
# Remove duplicate
uHl = crime_gdf.drop_duplicates(subset='newLoc').reset_index()
uHl.tail()
Out[60]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank newLoc
8539 272804 ELLICOTT Buffalo Fruit Belt District B 42.904 Theft NY 360290165002017 FRIDAY 016801 Buffalo Police are investigating this report o... 36029016801 2017 -78.867 2016 2 2 168.01 9 2020-05-22T09:40:00 0 Block BEST 360290001102 LARCENY/THEFT 20-1430268 168 2020-05-22T08:40:00 POINT (-8779434.280 5297371.065) lightgray -8779434.2803931075297371.064716155
8540 273666 SOUTH Buffalo South Park District A 42.843 Theft NY 360290011003009 SUNDAY 001000 Buffalo Police are investigating this report o... 36029001000 3009 -78.804 3008 3 3 10 14 2020-08-30T14:55:00 200 Block POTTERS RD 360290001103 LARCENY/THEFT 20-2430560 10 2020-08-29T22:00:00 POINT (-8772421.152 5288105.296) lightgray -8772421.152473135288105.2957101185
8541 274366 SOUTH Buffalo Hopkins-Tifft District A 42.843 Theft NY 360290001103028 THURSDAY 000110 Buffalo Police are investigating this report o... 36029000110 3028 -78.834 2000 3 2 1.10 10 2021-01-28T10:02:40 100 Block HOPKINS ST 360290001103 LARCENY/THEFT 21-0280215 1.10 2021-01-27T11:00:00 POINT (-8775760.737 5288105.296) lightgray -8775760.737196935288105.2957101185
8542 275013 MASTEN BUFFALO Kensington-Bailey District E 42.932 Other Sexual Offense NY 360290002004002 Saturday 004200 Buffalo Police are investigating this report o... 36029004200 4002 -78.821 4001 4 4 42 21 2019-10-28T06:20:00 500 Block NORTHUMBERLAND AV 360290002004 SEXUAL ABUSE 19-3000619 42 2019-10-26T21:00:00 POINT (-8774313.584 5301627.274) lightgray -8774313.5838166165301627.2744161
8543 275554 DELAWARE Buffalo North Park District D 42.95 Theft NY 360290035022002 SUNDAY 004902 Buffalo Police are investigating this report o... 36029004902 2002 -78.857 2002 2 2 49.02 16 2020-08-30T16:10:25 100 Block SARANAC AV 360290001102 LARCENY/THEFT 20-2430657 49 2020-08-30T10:00:00 POINT (-8778321.085 5304364.431) lightgray -8778321.0854851735304364.431079145
In [61]:
uHl.parent_incident_type.unique()
Out[61]:
array(['Theft', 'Assault', 'Sexual Offense', 'Breaking & Entering',
       'Theft of Vehicle', 'Robbery', 'Homicide', 'Other Sexual Offense',
       'Sexual Assault'], dtype=object)
In [62]:
allHl = pd.merge(uHl,numlocs,left_on='newLoc',right_on='uniquepts').drop(['newLoc'],axis=1)
print(f'Number of locations: {allHl.shape[0]}\n\
accounting for {allHl.counts.sum()} cases of crime incidents in Buffalo')
Number of locations: 8544
accounting for 276475 cases of crime incidents in Buffalo
In [63]:
plt.hist(allHl.counts,bins=3)
Out[63]:
(array([8.516e+03, 2.000e+01, 8.000e+00]),
 array([1.00000000e+00, 3.42666667e+02, 6.84333333e+02, 1.02600000e+03]),
 <a list of 3 Patch objects>)
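
With an integer `bins`, `plt.hist` splits the range between the smallest and largest value into equal-width bins. The sketch below reproduces the bin edges of the output by hand and reads the bar heights as shares; `equal_width_edges` is a hypothetical helper, and the bin counts are copied from the output.

```python
from typing import List


def equal_width_edges(lo: float, hi: float, bins: int) -> List[float]:
    """Return the bins + 1 edges of equal-width bins between lo and hi."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# Counts per location range from 1 to 1026 in the notebook.
edges = equal_width_edges(1, 1026, 3)
print([round(e, 3) for e in edges])

# Bin counts copied from the histogram output.
bin_counts = [8516, 20, 8]
share_first_bin = bin_counts[0] / sum(bin_counts)
print(f"{share_first_bin * 100:.2f}% of locations fall in the first bin")
```

Almost all locations sit in the first bin, so a logarithmic scale or more bins may show the distribution more clearly.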

Map

This section looks at the locations of theft cases, the most frequent crime type, and homicide cases, the most dangerous crime type.

In [64]:
# Theft cases: each row of allHl is one location, and `counts` is the number of incidents recorded there
theftcases = allHl.loc[allHl.parent_incident_type == 'Theft'].copy()
print(f'Number of Theft cases: {theftcases.shape[0]}\n\
Accounting for {allHl.counts.sum()} cases of crime incidents in Buffalo')
Number of Theft cases: 3530
Accounting for 276475 cases of crime incidents in Buffalo
In [65]:
# Homicide cases: each row of allHl is one location, and `counts` is the number of incidents recorded there
homicidecases = allHl.loc[allHl.parent_incident_type == 'Homicide'].copy()
print(f'Number of Homicide cases: {homicidecases.shape[0]}\n\
Accounting for {allHl.counts.sum()} cases of crime incidents in Buffalo')
Number of Homicide cases: 24
Accounting for 276475 cases of crime incidents in Buffalo
In [66]:
maxcir = 60
maxcnt = theftcases.counts.max()
theftcases['radius']=(theftcases.counts/maxcnt*maxcir)
theftcases['radius']=theftcases['radius'].astype(float).round().astype(int)
theftcases.head()
Out[66]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank counts radius
0 0 NIAGARA Buffalo Elmwood Bryant District B 42.913 Theft NY 360290066021002 SATURDAY 006602 Buffalo Police are investigating this report o... 36029006602 1002 -78.877 1001 1 1 66.02 12 2022-04-02T12:22:49 100 Block LEXINGTON AV 360290066021 LARCENY/THEFT 22-0920427 66.02 2022-04-02T12:21:49 POINT (-8780547.475 5298738.921) lightgray 190 11
3 3 SOUTH Buffalo South Park District A 42.842 Theft NY 360290007005003 SUNDAY 000700 Buffalo Police are investigating this report o... 36029000700 5003 -78.809 5003 5 5 7 19 2022-04-03T19:00:00 800 Block ABBOTT RD 360290007005 LARCENY/THEFT 22-0930670 7 2022-04-03T18:45:00 POINT (-8772977.750 5287953.474) lightgray 32 2
5 5 ELLICOTT Buffalo Central District B 42.885 Theft NY 360290165001040 THURSDAY 016500 Buffalo Police are investigating this report o... 36029016500 1040 -78.879 1068 1 1 165 9 2022-04-07T09:16:36 0 Block DELAWARE AV 360290165001 LARCENY/THEFT 22-0970226 165 2022-04-07T09:16:36 POINT (-8780770.114 5294484.023) lightgray 776 45
7 8 SOUTH Buffalo South Park District A 42.836 Theft NY 360290006003003 SUNDAY 000600 Buffalo Police are investigating this report o... 36029000600 3003 -78.816 3003 3 3 6 13 2022-04-03T13:34:24 100 Block MCKINLEY PW 360290006003 LARCENY/THEFT 22-0930428 6 2022-04-02T23:00:00 POINT (-8773756.986 5287042.596) lightgray 10 1
9 10 MASTEN Buffalo Central Park District E 42.942 Theft NY 360290045002005 SATURDAY 004500 Buffalo Police are investigating this report o... 36029004500 2005 -78.832 2005 2 2 45 9 2022-04-02T09:10:27 0 Block MERCER AV 360290045002 LARCENY/THEFT 22-0920277 45 2022-04-02T09:10:27 POINT (-8775538.098 5303147.818) lightgray 34 2
In [67]:
maxcir = 60
maxcnt = homicidecases.counts.max()
homicidecases['radius']=(homicidecases.counts/maxcnt*maxcir)
homicidecases['radius']=homicidecases['radius'].astype(float).round().astype(int)
homicidecases.head()
Out[67]:
index council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank counts radius
313 386 UNIVERSITY Buffalo Kensington-Bailey District E 42.942 Homicide NY 360290043002005 MONDAY 004300 Buffalo Police are investigating this report o... 36029004300 2005 -78.813 2005 2 2 43 22 2022-04-04T22:18:00 3200 Block BAILEY AV 360290043002 MURDER 22-0940993 43 2022-04-04T22:18:00 POINT (-8773423.028 5303147.818) deepPink 84 60
1158 1532 MASTEN BUFFALO MLK Park District C 42.908 Homicide NY 360290165002013 Friday 003502 Buffalo Police are investigating this report o... 36029003502 2013 -78.839 4009 2 4 35.02 21 2009-10-14T01:00:00 1200 Block FILLMORE AV 360290001102 MURDER 09-2820929 35 2009-10-09T21:11:00 POINT (-8776317.335 5297978.976) deepPink 45 32
1826 2594 FILLMORE BUFFALO Broadway Fillmore District C 42.904 Homicide NY 360290025021007 Saturday 003501 Buffalo Police are investigating this report o... 36029003501 1007 -78.831 1007 1 1 35.01 2 2019-09-24T22:49:00 HARMONIA & WALDEN 360290001101 MURDER 06-2100145 35 2006-07-29T02:21:00 POINT (-8775426.779 5297371.065) deepPink 84 60
1995 2909 MASTEN BUFFALO Masten Park District E 42.914 Homicide NY 360290046012000 Monday 016802 Buffalo Police are investigating this report o... 36029016802 2000 -78.855 4000 2 4 168.02 10 2019-09-24T22:46:00 100 Block WELKER ST 360290001102 MURDER 06-3590211 168 2006-12-25T10:21:00 POINT (-8778098.447 5298890.918) deepPink 36 26
2408 3715 MASTEN BUFFALO Delavan Grider District E 42.923 Homicide NY 360290002001009 Thursday 017000 Buffalo Police are investigating this report o... 36029017000 1009 -78.818 1007 1 1 170 20 2019-09-24T17:27:00 1000 Block E DELAVAN AV 360290001101 MURDER 06-0611008 170 2006-03-02T20:26:00 POINT (-8773979.625 5300258.996) deepPink 1 1
In [68]:
theftcases.to_crs('epsg:3857',inplace=True)
homicidecases.to_crs('epsg:3857',inplace=True)
output_file("/content/CrimePointFrequencyMaps.html",
            title="Locations with Frequency Crime Incidents in Buffalo")

f1 = figure(title = "Location of Theft cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs)
f2 = figure(title = "Location of Homicide cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs,
            x_range=f1.x_range,y_range=f1.y_range)

f1.add_tile(tileProvider)
f1.title.text_font_style = 'italic'
f1.title.text_font_size = '14pt'
f1.axis.visible=False 

f2.add_tile(tileProvider)
f2.title.text_font_style = 'italic'
f2.title.text_font_size = '14pt'
f2.axis.visible=False 

point_source_1 = GeoJSONDataSource(geojson=theftcases.to_json())
point_source_2 = GeoJSONDataSource(geojson=homicidecases.to_json())


Circle1=f1.circle('x','y',size='radius',fill_color='blue',line_color='blue',fill_alpha=0.5,source=point_source_1)
Circle2=f2.circle('x','y',size='radius',fill_color='red',line_color='red',fill_alpha=0.5,source=point_source_2)

c_hover= HoverTool(renderers=[Circle1])
c_hover.point_policy = "follow_mouse"
c_hover.tooltips=[("Address","@address_1, @neighborhood_1"),
                  ("   " , "    "),
                  ("Number of Cases","@counts")]

f1.add_tools(c_hover)

c2_hover= HoverTool(renderers=[Circle2])
c2_hover.point_policy = "follow_mouse"
c2_hover.tooltips=[("Address","@address_1, @neighborhood_1"),
                  ("   " , "    "),
                  ("Number of Cases","@counts")]

f2.add_tools(c2_hover)

heading = Div(text="""<h1>Point Frequency Maps</h1>\
<p>The two maps below show the locations and frequencies of theft and homicide cases in Buffalo. \
On the left, proportional point symbols show the locations of theft cases; on the right, the locations of homicide cases.</p>\
<p>Use the tools to the right of each map to pan, zoom, and so on. \
Hover over a point to see the address and number of cases.</p>\
<p><b><i>Data Source</i></b>: <a href="https://data.buffalony.gov/" target="_blank">Buffalo Open Data</a>.</p>\
<p style="font-size:9px;">Maps created 4/10/2022 by Nguyet Que T. Tran.</p>""", sizing_mode="stretch_both")

layout = column(heading, row(f1,f2),sizing_mode='stretch_both',margin=(5,5,5,5))
show(layout)

Point locations represent where the actual event occurred. This approach is only viable if there are point locations with multiple occurrences of the geographic event under consideration.
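The per-location counts behind these maps come from grouping incidents that share the same coordinates. The following is a minimal sketch of that aggregation in plain Python rather than the notebook's pandas merge. The function name `count_by_location`, the rounding precision, and the sample list are illustrative assumptions, not part of the project code.

```python
from collections import Counter

def count_by_location(points, ndigits=3):
    """Count incidents per location.

    Coordinates are rounded so that incidents geocoded to the same
    spot share one dictionary key.
    """
    return Counter((round(x, ndigits), round(y, ndigits)) for x, y in points)

# Small illustrative sample of projected coordinates (EPSG:3857)
sample = [
    (-8780547.475, 5298738.921),
    (-8780547.475, 5298738.921),
    (-8772977.750, 5287953.474),
]
counts = count_by_location(sample)

# Locations with more than one incident are the ones a frequency map highlights
repeated = {loc: n for loc, n in counts.items() if n > 1}
```

If no location appears more than once, every symbol would have the same size and a plain dot map would carry the same information.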

The map shows the locations where crime incidents occurred. Each point is geocoded to the location of an address, house, or store.

The size of the symbol at each point represents the number of crimes that happened at that location: the more cases, the larger the circle.
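The symbol sizing can be sketched as a small helper. It follows the linear rule used in the cells above (count divided by the largest count, times a maximum size of 60, then rounded). The helper name `scale_radius` and the `min_radius` clamp are assumptions added for illustration; the notebook itself does not clamp, so locations with very few cases can round to a size of 0 when the largest count is much bigger. The value 1026 is assumed to be the largest count, taken from the upper edge of the histogram above.

```python
def scale_radius(count, max_count, max_radius=60, min_radius=1):
    """Linear proportional-symbol size, clamped so small counts stay visible."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    radius = round(count / max_count * max_radius)
    # min_radius is an illustrative addition, not part of the notebook's formula
    return max(min_radius, radius)

# Counts taken from the theft table above; 1026 is an assumed maximum
for c in (5, 10, 190, 776, 1026):
    print(c, scale_radius(c, max_count=1026))
```

Setting `min_radius=0` reproduces the unclamped behavior. Scaling by the square root of the count instead would make symbol area, rather than diameter, proportional to the number of cases, which is a common alternative for proportional symbols.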

Point Distribution Map

In [69]:
# Buffalo Council Districts dataset
api_url="https://data.buffalony.gov/resource/u5mx-ugvy.geojson"
cd_gdf=gpd.read_file(api_url)
cd_gdf.tail()
Out[69]:
dist_id dist_name shape_leng objectid_1 geometry
5 9 NIAGARA 0.14931438999999999 9 MULTIPOLYGON (((-78.89588 42.92591, -78.89457 ...
6 7 UNIVERSITY 0.18683203000000001 8 MULTIPOLYGON (((-78.80780 42.95894, -78.80774 ...
7 8 MASTEN 0.20357618 10 MULTIPOLYGON (((-78.82813 42.94033, -78.82812 ...
8 4 SOUTH 0.51072598000000002 5 MULTIPOLYGON (((-78.88394 42.87750, -78.88360 ...
9 2 NORTH 0.20635665 3 MULTIPOLYGON (((-78.89236 42.96107, -78.89061 ...
In [70]:
crime_gdf = crime_gdf.to_crs('epsg:3857')
cd_gdf = cd_gdf.to_crs('epsg:3857')
In [71]:
# `predicate` replaces the deprecated `op` parameter of gpd.sjoin
joindf = gpd.sjoin(crime_gdf,cd_gdf,how='inner',predicate='intersects')
In [72]:
joindf.tail()
Out[72]:
council_district city neighborhood_1 police_district latitude parent_incident_type state geoid20_block day_of_week tractce20 incident_description geoid20_tract census_block longitude census_block_2010 census_block_group census_block_group_2010 census_tract hour_of_day created_at address_1 geoid20_blockgroup incident_type_primary case_number census_tract_2010 incident_datetime geometry conrank newLoc index_right dist_id dist_name shape_leng objectid_1
262022 FILLMORE BUFFALO Central District A 42.862 Theft NY 360290015001025 Wednesday 000500 Buffalo Police are investigating this report o... 36029000500 1025 -78.867 1034 1 1 5 0 2020-01-14T08:35:00 300 Block OHIO ST 360290001101 LARCENY/THEFT 20-0130379 5 2019-11-13T00:00:00 POINT (-8779434.280 5290990.373) lightgray -8779434.2803931075290990.373048019 2 0 UnAssigned 0.47838895999999997 1
265734 FILLMORE Buffalo Central District A 42.862 Theft NY 360290165001045 SATURDAY 000500 Buffalo Police are investigating this report o... 36029000500 1045 -78.867 1032 1 1 5 8 2020-05-02T08:13:00 300 Block OHIO ST 360290001101 LARCENY/THEFT 20-1230216 5 2020-05-02T00:45:00 POINT (-8779434.280 5290990.373) lightgray -8779434.2803931075290990.373048019 2 0 UnAssigned 0.47838895999999997 1
272676 SOUTH Buffalo South Park District A 42.856 Theft NY 360290046012000 WEDNESDAY 000900 Buffalo Police are investigating this report o... 36029000900 2000 -78.816 2000 2 2 9 15 2020-06-10T15:48:00 100 Block MELROSE ST 360290001102 LARCENY/THEFT 20-1620548 9 2020-06-09T19:00:00 POINT (-8773756.986 5290079.200) lightgray -8773756.9863626515290079.20010121 2 0 UnAssigned 0.47838895999999997 1
274079 SOUTH BUFFALO South Park District A 42.852 Theft of Vehicle NY 360290054003003 Monday 000900 Buffalo Police are investigating this report o... 36029000900 3003 -78.811 3003 3 3 9 12 2019-12-16T23:01:00 500 Block S LEGION DR 360290001103 UUV 19-3500612 9 2019-12-16T12:30:00 POINT (-8773200.389 5289471.801) lightgray -8773200.3889086845289471.800653316 2 0 UnAssigned 0.47838895999999997 1
275968 SOUTH BUFFALO Seneca-Cazenovia District A 42.854 Assault NY 360290002004006 Saturday 001000 Buffalo Police are investigating this report o... 36029001000 4006 -78.813 4006 4 4 10 0 2019-11-23T08:20:00 N LEGION DR & YALE PL 360290002004 ASSAULT 19-3270025 10 2019-11-23T00:40:00 POINT (-8773423.028 5289775.495) blue -8773423.027890275289775.495459603 2 0 UnAssigned 0.47838895999999997 1
In [73]:
joindf['council_district']=joindf.council_district.astype(str)
# Count incidents per council district, largest first
ct = joindf.groupby('council_district').size().sort_values(ascending=False)
ctdf=ct.to_frame(name='counts').reset_index()
In [74]:
ctdf.tail()
Out[74]:
council_district counts
4 FILLMORE 31496
5 MASTEN 31142
6 NIAGARA 30502
7 DELAWARE 18526
8 SOUTH 17260
In [75]:
nCases = pd.merge(cd_gdf,ctdf,left_on="dist_name",right_on="council_district")
nCases['centroids'] =nCases['geometry'].centroid
nCases = nCases.set_geometry('centroids')
In [76]:
maxcir = 60
maxcnt = nCases.counts.max()
nCases['radius']=(nCases.counts/maxcnt*maxcir)
nCases['radius']=nCases['radius'].astype(float).round().astype(int)
nCases.head()
Out[76]:
dist_id dist_name shape_leng objectid_1 geometry council_district counts centroids radius
0 5 DELAWARE 0.19087878 6 MULTIPOLYGON (((-8778099.466 5305679.993, -877... DELAWARE 18526 POINT (-8778543.920 5302638.853) 24
1 3 FILLMORE 0.42294144 4 MULTIPOLYGON (((-8773501.858 5299351.099, -877... FILLMORE 31496 POINT (-8776864.657 5294510.894) 42
2 1 ELLICOTT 0.32953199 2 MULTIPOLYGON (((-8778908.981 5299381.803, -877... ELLICOTT 45385 POINT (-8779145.197 5296280.698) 60
3 6 LOVEJOY 0.35163747000000001 7 MULTIPOLYGON (((-8773498.518 5300355.298, -877... LOVEJOY 32503 POINT (-8773355.009 5294709.224) 43
4 9 NIAGARA 0.14931438999999999 9 MULTIPOLYGON (((-8782649.206 5300701.761, -878... NIAGARA 30502 POINT (-8781916.166 5298678.305) 40

Map

In [77]:
output_file("/content/CrimeDistributionMaps.html",
            title="Crime Incidents by Council Districts in Buffalo")
f1 = figure(title = "Crime incident cases in Buffalo", tools=TOOLS, toolbar_sticky=False,**kwargs)

f1.add_tile(tileProvider)
f1.title.text_font_style = 'italic'
f1.title.text_font_size = '14pt'
f1.axis.visible=False 

TA20 = nCases.drop('geometry',axis=1).copy()


point_source_1 = GeoJSONDataSource(geojson=TA20.to_json())
poly_source = GeoJSONDataSource(geojson=cd_gdf.to_json())

Circle1=f1.circle('x','y',size='radius',fill_color='blue',line_color='blue',fill_alpha=0.5,source=point_source_1)
areas = f1.patches('xs','ys',source=poly_source,name="Council Districts",fill_color=None,fill_alpha=0.6,line_color="black",line_width=0.5)

c_hover= HoverTool(renderers=[Circle1])
c_hover.point_policy = "follow_mouse"
c_hover.tooltips=[
                  ("Council Districts","@dist_name"),
                  ("Number of Cases","@counts")]

f1.add_tools(c_hover)



heading = Div(text="""<h1>Point Distribution Map</h1>\
<p>The map below shows the distribution of crime incident cases across Buffalo's council districts.</p>\
<p>Use the tools to the right of the map to pan, zoom, and so on. \
Hover over a circle to see the council district and number of cases.</p>\
<p><b><i>Data Source</i></b>: <a href="https://data.buffalony.gov/" target="_blank">Buffalo Open Data</a>.</p>\
<p style="font-size:9px;">Maps created 4/10/2022 by Nguyet Que T. Tran.</p>""", sizing_mode="stretch_both")

layout = column(heading, row(f1),sizing_mode='stretch_both',margin=(5,5,5,5))
show(layout)

The map shows the number of confirmed crime cases by Buffalo council district. The centroid of each council district's polygon boundary is used to place a circle representing the total number of confirmed crime cases within that district. The higher the number, the larger the circle.
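To illustrate what "the center of a polygon" means here, the sketch below computes the area-weighted centroid of a single simple polygon with the shoelace formula. It is a simplified stand-in for the `.centroid` property used in the notebook, which also handles multi-part geometries; the function name `polygon_centroid` and the rectangle are assumptions for illustration.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    `vertices` is a list of (x, y) tuples; the ring need not be closed.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)

# Hypothetical 4 x 2 rectangle
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
center = polygon_centroid(rect)
```

For a strongly concave district the centroid can fall outside the polygon, in which case a point guaranteed to lie inside the shape may be a better anchor for the circle.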

In summary, Ellicott is the council district with the highest number of crime cases, 45,385 in the counts above. South has the lowest with 17,260 cases, followed by Delaware with 18,526.